Abstract We propose an efficient framework for enabling secure multi-party numerical computations in a Peer-to-Peer network. This problem arises in a range of applications such as collaborative filtering, distributed computation of trust and reputation, monitoring and other tasks, where the computing nodes are expected to preserve the privacy of their inputs while performing a joint computation of a certain function.

Although there is a rich literature in the field of 
distributed systems security concerning secure multi- 
party computation, in practice it is hard to deploy those 
methods in very large scale Peer-to-Peer networks. In 
this work, we try to bridge the gap between theoretical 
algorithms in the security domain, and a practical Peer- 
to-Peer deployment. 

We consider two security models. The first is the semi-honest model, where peers correctly follow the protocol but try to reveal private information. We provide three possible schemes for secure multi-party numerical computation in this model and identify a single lightweight scheme which outperforms the others. Using extensive simulations over real Internet topologies, we demonstrate that our scheme scales to very large networks, with up to millions of nodes.

The second model we consider is the malicious 
peers model, where peers can behave arbitrarily, delib- 
erately trying to affect the results of the computation 
as well as compromising the privacy of other peers. 
For this model we provide a fourth scheme that defends the execution of the computation against malicious peers. This scheme has a higher complexity than the schemes for the semi-honest model. Overall, we provide the Peer-to-Peer network designer with a set of tools to choose from, based on the desired level of security.

1 Introduction 

We consider the problem of performing a joint numerical computation of some function over a Peer-to-Peer network. Such problems arise in many applications, for example, when distributively computing trust [22], ranking of nodes and data items [12], clustering [5], collaborative filtering [6, 35], factor analysis [14], etc.
The aim of secure multi-party computation is to enable 
parties to carry out such distributed computing tasks in 
a secure manner. Whereas distributed computing clas- 
sically deals with questions of computing under the 
threat of machine crashes and other inadvertent faults, 
secure multi-party computation is concerned with the 
possibility of deliberate malicious behavior by some 
adversarial entity. That is, it is assumed that a proto- 
col execution may come under attack by an external 
entity, or even by a subset of the participating parties. 
The aim of this attack may be to learn private infor- 
mation or cause the result of the computation to be 
incorrect. Thus, two central requirements on any se- 
cure computation protocol are privacy and correctness. 
The privacy requirement states that nothing should be learned beyond what is absolutely necessary; more exactly, parties should learn their designated output and nothing else. The correctness requirement states that each party should receive its correct output. Therefore, the adversary must not be able to cause the result of the computation to deviate from the function that the parties had set out to compute.

In this paper, we consider only functions which are built using the algebraic primitives of addition, subtraction and multiplication. In particular, we focus on numerical methods which are computed distributively in a Peer-to-Peer network, where in each iteration, every node interacts with a subset of its neighbors by sending scalar messages, and computes a weighted sum of the messages that it receives. Examples of such functions are belief propagation [28], EM (expectation maximization) [14], the Power method [22], separable functions [26], gradient descent methods [33] and linear iterative algorithms for solving systems of linear equations [10]. As a specific example, we describe the Jacobi algorithm for computing such functions in detail in Section 7.

There is a rich body of research on secure computation, starting with the seminal work of Yao [34]. Part of this research is concerned with the design of generic secure protocols that can be used for computing any function (for example, Yao's work [34] for the case of two participants, and e.g. [9, 21] for the case of multiple participants). There are several works concerning the implementation of generic protocols for secure computation. For example, FairPlay [25] is a system for secure two-party computation, and FairPlayMP [?] is a different system for secure computation by more than two parties. These two systems are based (like Yao's protocol) on reducing any function to a representation as a Boolean circuit and computing the resulting Boolean circuit securely. Our approach is much more efficient, at the cost of supporting only a subset of the functions the FairPlay system can compute.

A different line of work studies secure protocols for computing specific functions (rather than generic protocols for computing any function). Of particular interest for us are works that add a privacy preserving layer to the computation of functions such as the factor analysis learning problem (for which [14] describes a secure multi-party protocol using homomorphic encryption), computing trust in a Peer-to-Peer network (for which [22] suggests a solution using a trusted third party), or the work of [33], which is closely related to our work, but is limited to two parties.

Most previous solutions for secure multi-party computation suffer from at least one of the following drawbacks: (1) they provide a centralized solution where all information is shipped to a single computing node, and/or (2) they require communication between all participants in the protocol, and/or (3) they require the use of asymmetric encryption, which is costly. In this work, we investigate secure computation in a Peer-to-Peer setting, where each node is only connected to some of the other nodes (its neighbors). We examine different possible distributed approaches, and out of these we identify a single approach which is theoretically secure and at the same time efficient and scalable.

Security is often based on the assumption that there 
is an upper bound on the global number of malicious 
participants. In our setting, we consider the number of 
malicious nodes in each local vicinity. Furthermore, 
most of the existing algorithms scale to tens or hun- 
dreds of nodes, at the most. In this work, we address 
the problem in a setting of a large Peer-to-Peer net- 
work, with millions of nodes and hundreds of millions 
of communication links. Unlike most of the previous 
work, we have performed a very large scale simula- 
tion, using real Internet topologies, demonstrating that 
our approach is applicable to real network settings. 

As an example application of our framework, we take the neighborhood-based collaborative filtering algorithm of [?], a recent state-of-the-art algorithm. There are two challenges in adapting this algorithm to a Peer-to-Peer network. First, the algorithm is centralized, and we propose a method to distribute it. Second, we add a privacy preserving layer, so that no information about personal rankings is revealed during the computation.

The paper is organized as follows. In Section 2 we formulate our problem model. In Section 3 we give a brief background on the cryptographic primitives used in our schemes. Section 4 outlines our novel construction for the semi-honest model. In Section 5 we review the cryptographic primitives needed to extend our construction to the malicious adversary model. Section 6 presents our extended construction for the malicious adversary model. An example collaborative filtering application is given in Section 7. Large scale simulations are presented in Section 8. We conclude in Section 9.

We use the following notation: $^T$ stands for a vector or matrix transpose, and the symbols $\{\cdot\}_i$ and $\{\cdot\}_{ij}$ denote entries of a vector and a matrix, respectively. $N_i$ is the set of neighboring nodes of node $i$. The spectral radius is $\rho(B) = \max_{1 \le i \le s} |\lambda_i|$, where $\lambda_1, \ldots, \lambda_s$ are the eigenvalues of a matrix $B$.

2 Our Model 

Given a Peer-to-Peer network graph $G = (V, E)$ with $|V| = n$ nodes and $|E| = e$ edges, we would like to perform a joint iterative computation. Each node $i$ starts with a scalar¹ state $x_i \in \mathbb{R}$, and on each round sends messages to a subset of its neighbors. We denote a message sent from node $i$ to node $j$ at round $r$ as $m_{ij}^{r}$.

Let $N_i$ denote the set of neighboring nodes of $i$. Denote the neighbors of node $i$ as $n_{i1}, n_{i2}, \ldots, n_{ik}$, where $k = |N_i|$. We assume, w.l.o.g., that each node sends a message to a subset of its neighbors (possibly including itself). On each round $r = 1, 2, \ldots$, node $i$ computes, based on the messages it received, a function $f: \mathbb{R}^{k+1} \to \mathbb{R}^{k+1}$.

Namely, the function gets as input the initial state 
(which is denoted as a self message $m_{ii}$) and all the
received neighbor messages of this round, and outputs 
a new state and messages to be sent to a subset of the 
neighbors at the next round. The iterative algorithms 
run either a predetermined number of rounds, or until 
convergence is detected locally. Whenever the refer- 
ence to the round number is clear from the text, the 
round numbers are omitted to simplify our notations. 

In this paper, we are only interested in functions 
$f$ that compute weighted sums on each iteration. Next
we show that there is a variety of such numerical meth- 
ods. Our goal is to add a privacy preserving layer to 
the distributed computation, such that the only infor- 
mation learned by a node is its share of the output. 

In Section 4 we use the semi-honest adversaries
model: in this model (common in cryptographic re- 
search of secure computation) even corrupted parties 
are assumed to correctly follow the protocol specifica- 
tion. However, the adversary obtains the internal states 
of all the corrupted parties (including the transcript of 
all the messages received), and attempts to use this in- 
formation to learn information that should remain pri- 
vate. 

Security against semi-honest adversaries might be 
justified if the parties participating in the protocol are 
somewhat trusted, or if we trust the participating par- 
ties at the time they execute the protocol, but suspect 
that at a later time an adversary might corrupt them and 
get hold of the transcript of the information received in 
the protocol. 

Section 6 extends our construction to the "malicious adversary" model, in which the adversary can behave arbitrarily. We note
that protocols secure against malicious adversaries are 
considerably more costly than their semi-honest coun- 
terparts. For example, the generic method of obtain- 
ing security against malicious adversaries is through 
the GMW compiler [21] which adds a zero-knowledge 
proof for every step of the protocol. 



1 An extension to the vector case is immediate; we omit it for clarity of description.



We define a configurable local system parameter $d_i$, where $d_i - 1$ is the maximum number of nodes in the local vicinity of node $i$ (direct neighbors of node $i$) that might be corrupted. Whenever this assertion is violated, the security of our proposed scheme is affected. This is a stronger requirement from our system, relative to the traditional global bound on the number of adversarial nodes.



3 Cryptographic primitives for the semi-honest model

We compare several existing approaches from the lit- 
erature of secure multi-party computation and discuss 
their relevance to Peer-to-Peer networks. 



3.1 Random perturbations

The random additive perturbation method attempts to preserve the privacy of the data by modifying values of the sensitive attributes using a randomized process (see [4, 15, 18]). In this approach, the node sends a value $u_i + v$, where $u_i$ is the original scalar message and $v$ is a random value drawn from a certain distribution $V$. In order to perturb the data, $n$ independent samples $v_1, v_2, \ldots, v_n$ are drawn from the distribution $V$. The owners of the data provide the perturbed values $u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n$ and the cumulative distribution function $F_V(r)$ of $V$. The goal is to use these values, instead of the original ones, in the computation. (It is easy to see, for example, that if the expected value of $V$ is 0, then the expectation of the sum of the $u_i + v_i$ values is equal to the sum of the $u_i$ values.) The hope is that by adding random noise to the individual data points it is possible to hide the individual values.
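To make the expectation argument concrete, the following minimal Python sketch perturbs $n$ values and compares the means. It assumes, purely for illustration, that $V$ is a zero-mean Gaussian $N(0, \sigma^2)$; the helper `perturb` and all parameter values are hypothetical.

```python
import random


def perturb(values, sigma, rng):
    """Return u_i + v_i for every u_i, with v_i drawn i.i.d. from V = N(0, sigma^2).

    The Gaussian choice of V is an illustrative assumption.
    """
    return [u + rng.gauss(0.0, sigma) for u in values]


rng = random.Random(42)  # fixed seed so the run is reproducible
original = [rng.uniform(0.0, 10.0) for _ in range(100_000)]
noisy = perturb(original, sigma=1.0, rng=rng)

# Individual values are masked, but because E[V] = 0 the means agree up to
# sampling noise of order sigma / sqrt(n).
mean_original = sum(original) / len(original)
mean_noisy = sum(noisy) / len(noisy)
print(f"mean of u_i: {mean_original:.3f}, mean of u_i + v_i: {mean_noisy:.3f}")
```

Each individual $u_i + v_i$ can differ noticeably from $u_i$, while the two means differ only by sampling noise.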

The random perturbation model is limited. It supports only addition operations, and it was shown in [?] that this approach can ensure only very limited privacy guarantees. We present this method only in order to compare its running time with that of the other protocols.

3.2 Shamir's Secret Sharing (SSS) 

Secret sharing is a fundamental primitive of cryptographic protocols. We will describe the secret sharing scheme of Shamir [30]. The scheme works over a field $\mathbb{F}$, and it is assumed that the secret $s$ is an element in that field. In a $k$-out-of-$n$ secret sharing scheme, the owner of a secret wishes to distribute it among $n$ players such that any subset of at least $k$ of them is able to recover the secret, while no subset of up to $k-1$ players is able to learn any information about the secret. (In the application described in this paper each player will be a node in the network.)

In order to distribute the secret, its owner chooses a random polynomial $P(\cdot)$ of degree $k-1$, subject to the constraint that $P(0) = s$. This is done by choosing random coefficients $a_1, \ldots, a_{k-1}$ and defining the polynomial as $P(x) = s + \sum_{i=1}^{k-1} a_i x^i$. Each player is associated with an identity in the field (denoted $x_1, \ldots, x_n$ for players $1, \ldots, n$, respectively). The share that player $i$ receives is the value $P(x_i)$, namely the value of the polynomial evaluated at the point $x_i$. It is easy to see that any $k$ players can recover the secret, since they have $k$ values of the polynomial and can therefore interpolate it and compute its free coefficient $s$. It is also not hard to see that any set of up to $k-1$ players does not learn any information about $s$, since any value of $s$ has a probability of $1/|\mathbb{F}|$ of resulting in a polynomial which agrees with the values that the players have.
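A minimal sketch of $k$-out-of-$n$ Shamir sharing over a prime field, showing that any $k$ shares recover the free coefficient. The modulus $2^{61} - 1$ (a Mersenne prime) and the helper names `share` and `reconstruct` are our own illustrative choices, not part of the scheme's specification.

```python
import random

P = 2**61 - 1  # a Mersenne prime; the field F = Z_P is an illustrative choice


def share(secret, k, xs, rng):
    """Return {x: P(x)} for a random polynomial of degree k-1 with P(0) = secret."""
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(k - 1)]

    def poly(x):
        acc = 0
        for a in reversed(coeffs):  # Horner's rule
            acc = (acc * x + a) % P
        return acc

    return {x: poly(x) for x in xs}


def reconstruct(shares):
    """Recover P(0) from {x: P(x)} by Lagrange interpolation at 0 (needs >= k points)."""
    secret = 0
    for xj, yj in shares.items():
        num, den = 1, 1
        for xk in shares:
            if xk != xj:
                num = num * xk % P
                den = den * (xk - xj) % P
        secret = (secret + yj * num * pow(den, -1, P)) % P
    return secret


rng = random.Random(7)
shares = share(123456789, k=3, xs=[1, 2, 3, 4, 5], rng=rng)
subset = {x: shares[x] for x in (2, 4, 5)}  # any 3 of the 5 shares suffice
print(reconstruct(subset))  # 123456789
```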



3.3 Homomorphic encryption 

A homomorphic encryption scheme is an encryption scheme that allows certain algebraic operations to be carried out on the encrypted plaintext, by applying an efficient operation to the corresponding ciphertext (without knowing the decryption key!). In particular, we will be interested in additively homomorphic encryption schemes. Here, the message space is a ring (or a field). There exists an efficient algorithm $+_{pk}$ whose input is the public key of the encryption scheme and two ciphertexts, and whose output is $E_{pk}(m_1) +_{pk} E_{pk}(m_2) = E_{pk}(m_1 + m_2)$. (Namely, given the public key and two ciphertexts, this algorithm computes the encryption of the sum of the plaintexts of the two ciphertexts.) There is also an efficient algorithm $\cdot_{pk}$, whose input consists of the public key of the encryption scheme, a ciphertext, and a constant $c$ in the ring, and whose output is $c \cdot_{pk} E_{pk}(m) = E_{pk}(c \cdot m)$.

We will also require that the encryption scheme has semantic security. An efficient implementation of an additively homomorphic encryption scheme with semantic security was given by Paillier [27]. In this cryptosystem the encryption of a plaintext from $\mathbb{Z}_N$, where $N$ is an RSA modulus, requires two exponentiations modulo $N^2$. Decryption requires a single exponentiation. We will use this encryption scheme in our work.

3.3.1 Paillier encryption 

We describe the Paillier cryptosystem in a nutshell. Fuller details can be found in [27].



- Key generation: Generate two large primes $p$ and $q$. The secret key $sk$ is $\lambda = \mathrm{lcm}(p-1, q-1)$. The public key $pk$ includes $N = pq$ and $g \in \mathbb{Z}^*_{N^2}$ such that $g \equiv 1 \pmod N$ (a common choice is $g = N + 1$).

- Encryption: Encrypt a message $m \in \mathbb{Z}_N$ with randomness $r \in \mathbb{Z}^*_{N^2}$ and public key $pk$ as $c = g^m r^N \bmod N^2$.

- Decryption: Decrypt a ciphertext $c \in \mathbb{Z}^*_{N^2}$ using $m = \frac{L(c^{\lambda} \bmod N^2)}{L(g^{\lambda} \bmod N^2)} \bmod N$, where $L(x) = (x-1)/N$.
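The following toy Python sketch implements the three Paillier algorithms above together with the homomorphic operations $+_{pk}$ and $\cdot_{pk}$ from Section 3.3. It is a hedged illustration only: the primes are tiny and insecure, $g = N + 1$ is one admissible choice, $r$ is drawn from $\mathbb{Z}^*_N$ (which suffices for encryption), and all function names are hypothetical.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation (p, q distinct primes; far too small to be secure)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)                 # secret key lambda (Python 3.9+)
    g = n + 1                                    # satisfies g = 1 (mod N)
    n2 = n * n
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)  # L(g^lambda mod N^2)^(-1) mod N
    return (n, g), (lam, mu)


def encrypt(pk, m, rng):
    """c = g^m * r^N mod N^2, with r drawn from Z*_N."""
    n, g = pk
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2


def decrypt(pk, sk, c):
    """m = L(c^lambda mod N^2) * mu mod N, with L(x) = (x - 1) / N."""
    n, _ = pk
    lam, mu = sk
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n


def add(pk, c1, c2):
    """The +_pk operation: E(m1) * E(m2) mod N^2 = E(m1 + m2)."""
    n, _ = pk
    return c1 * c2 % (n * n)


def scale(pk, c, k):
    """The ._pk operation: E(m)^k mod N^2 = E(k * m)."""
    n, _ = pk
    return pow(c, k, n * n)


rng = random.Random(1)
pk, sk = keygen(1789, 1861)
c = add(pk, encrypt(pk, 20, rng), scale(pk, encrypt(pk, 7, rng), 3))
print(decrypt(pk, sk, c))  # 20 + 3 * 7 = 41
```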



4 Our construction 

The main observation we make is that numerous distributed numerical methods compute in each node a weighted sum of scalars $m_{ji}$ received from neighboring nodes, namely

$$\sum_{j \in N_i} a_{ji} m_{ji}, \qquad (1)$$

where the weight coefficients $a_{ji}$ are known constants.
This simple building block captures the behavior of 
multiple numerical methods. By showing ways to com- 
pute this weighted sum securely, our framework can 
support many of those numerical methods. In this sec- 
tion we introduce three possible approaches for per- 
forming the weighted sum computation. 

In Section 7.1 we give an example of the Jacobi algorithm, which computes such a weighted sum on each iteration.



4.1 A Construction Based on Random Perturbations 

In each iteration of the algorithm, whenever a node $j$ needs to send a value $m_{ji}$ to a neighboring node $i$, node $j$ generates a random number $r_{ji}$ using the GMP library [1], from a probability distribution with zero mean. It then sends the value $m_{ji} + r_{ji}$ to node $i$. As the number of neighbors increases, the computed noisy sum $\sum_{j \in N_i} (m_{ji} + r_{ji})$ converges to the actual sum $\sum_{j \in N_i} m_{ji}$.

When node $i$ computes a weighted sum of the messages it received as in equation (1), it multiplies each incoming message by the corresponding weight. The computed noisy sum $\sum_{j \in N_i} a_{ji} (m_{ji} + r_{ji})$ converges to the actual sum $\sum_{j \in N_i} a_{ji} m_{ji}$.
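A minimal sketch of node $i$'s computation in the perturbation scheme, assuming zero-mean Gaussian noise and averaging weights $a_{ji} = 1/k$ (both illustrative assumptions; Python's `random` module stands in for GMP). The function name `noisy_weighted_sum` is hypothetical.

```python
import random


def noisy_weighted_sum(messages, weights, sigma, rng):
    """Node i's computation under the perturbation scheme (sketch).

    messages[j] = m_ji and weights[j] = a_ji. Each neighbor j is simulated as
    sending m_ji + r_ji with r_ji ~ N(0, sigma^2).
    """
    received = {j: m + rng.gauss(0.0, sigma) for j, m in messages.items()}
    return sum(weights[j] * received[j] for j in received)


rng = random.Random(0)
k = 10_000                                  # number of neighbors of node i
messages = {j: rng.uniform(0.0, 1.0) for j in range(k)}
weights = {j: 1.0 / k for j in range(k)}    # averaging weights (assumption)

exact = sum(weights[j] * messages[j] for j in messages)
noisy = noisy_weighted_sum(messages, weights, sigma=1.0, rng=rng)
print(f"exact {exact:.4f}, noisy {noisy:.4f}")
```

With these weights the error has standard deviation $\sigma/\sqrt{k}$, so it shrinks as $k$ grows; with unit weights the absolute error would instead grow like $\sigma\sqrt{k}$.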

We note again that this method is considered mainly 
for a comparison of its running time with those of the 
other methods. 
4.2 A Construction Based on Homomorphic Encryption

We chose to utilize the Paillier encryption scheme, which is an efficient realization of an additively homomorphic encryption scheme with semantic security.

Key generation: We use the threshold version of the Paillier encryption scheme described in [20]. In this scheme, a trusted third party generates a private and public key pair for each node $i$.¹ The public key is disseminated to all of node $i$'s neighbors. The private key $\lambda_i = prvk(i)$ is kept secret from all nodes (including node $i$). Instead, it is split, using secret sharing, among the neighbors of node $i$. There is a threshold $d_i$, which is at most equal to $|N_i|$, the number of neighbors of node $i$. The scheme ensures that any subset of $d_i$ of the neighbors of node $i$ can help it decrypt messages (without the neighbors learning the decrypted message, or node $i$ learning the private key). If $d_i = |N_i|$, then the private key is shared by giving each neighbor $j$ a random value $s_{ji}$ subject to the constraint $\sum_{j \in N_i} s_{ji} = \lambda_i = prvk(i)$. Otherwise, if $d_i < |N_i|$, the values $s_{ji}$ are shares of a Shamir secret sharing of $\lambda_i$. Note that fewer than $d_i$ neighbors cannot recover the key.

Using this method, all neighboring nodes of node 
i can send encrypted messages using pubk(i) to node 
i, while node i cannot decrypt any of these messages. 
It can, however, aggregate the messages using the ho- 
momorphic property and ask a coalition of di or more 
neighbors to help it in decrypting the sum. 

The initialization step of this protocol is as follows: 

[H0] The third party creates for node $i$ a public and private key pair $(pubk(i), prvk(i))$. It sends the public key $pubk(i)$ to all of node $i$'s neighbors, and splits the private key into shares, such that each neighbor $j$ of node $i$ gets a share $s_{ji}$. If $d_i = |N_i|$ then $prvk(i) = \lambda_i = \sum_{j \in N_i} s_{ji}$. Otherwise the $s_{ji}$ values are Shamir shares of the private key.

One round of computation: In each round of the algorithm, when a node $j$ would like to send a scalar value $m_{ji}$ to node $i$ it does the following:

[H1] Encrypt the message $m_{ji}$ using node $i$'s public key, to get $C_{ji} = E_{pubk(i)}(m_{ji})$.

[H2] Send the result $C_{ji}$ to node $i$.



1 It is also possible to generate the keys in a distributed way, without using any trusted party. This option is less efficient. We show that even the centralized key generation process is not efficient enough, and therefore we have not implemented the distributed version of this protocol.



[H3] Node $i$ aggregates all the incoming messages $C_{ji}$, using the homomorphic property, to get $E_{pubk(i)}(\sum_{j \in N_i} a_{ji} m_{ji})$.

After receiving all messages: Node $i$'s neighbors assist it in decrypting the result $x_i$, without revealing the private key $prvk(i)$. This is done as follows (for the case $d_i = |N_i|$): recall that in a Paillier decryption node $i$ needs to raise the result computed in [H3] to the power of its private key $\lambda_i$.

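[H4] Node $i$ sends all its neighbors the result computed in [H3]: $C_i = E_{pubk(i)}(\sum_{j \in N_i} a_{ji} m_{ji})$.

[H5] Each neighbor $j$ computes a part of the decryption, $W_{ji} = C_i^{s_{ji}}$, where $s_{ji}$ are node $i$'s private key shares computed in step [H0], and sends the result $W_{ji}$ to node $i$.

[H6] Node $i$ multiplies all the received values to get $\prod_{j \in N_i} W_{ji} = \prod_{j \in N_i} C_i^{s_{ji}} = C_i^{\sum_{j \in N_i} s_{ji}} = C_i^{\lambda_i}$, and completes the Paillier decryption to obtain $\sum_{j \in N_i} a_{ji} m_{ji} \bmod N$.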
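A toy simulation of steps [H0]-[H6] for the case $d_i = |N_i|$ with additive key shares. It is a sketch under stated assumptions, which may differ in detail from the scheme of [20]: the primes are tiny, and, following a common threshold-Paillier construction, the dealer shares an exponent $d$ with $d \equiv 0 \pmod{\lambda}$ and $d \equiv 1 \pmod N$ in place of $\lambda_i$ itself, so that node $i$ can finish decryption with $L(\cdot)$ alone. All names are hypothetical.

```python
import math
import random

# Toy (insecure) parameters: p = 1789 and q = 1861 are distinct primes.
p, q = 1789, 1861
N = p * q
N2 = N * N
lam = math.lcm(p - 1, q - 1)
g = N + 1
# Assumed threshold-Paillier trick: share an exponent d_exp with
# d_exp = 0 (mod lam) and d_exp = 1 (mod N), so that C^d_exp = 1 + M*N (mod N^2).
d_exp = lam * pow(lam, -1, N)


def encrypt(m, rng):
    """Paillier encryption under node i's public key (N, g)."""
    r = rng.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = rng.randrange(1, N)
    return pow(g, m, N2) * pow(r, N, N2) % N2


def additive_shares(secret, k, rng):
    """[H0] Split `secret` into k integers s_ji whose (integer) sum is `secret`."""
    parts = [rng.randrange(N2) for _ in range(k - 1)]
    parts.append(secret - sum(parts))  # may be negative; pow() handles that below
    return parts


rng = random.Random(3)
messages = {1: 5, 2: 11, 3: 2, 4: 7}  # m_ji from neighbors j = 1..4
weights = {1: 1, 2: 2, 3: 3, 4: 1}    # public weights a_ji
shares = dict(zip(messages, additive_shares(d_exp, len(messages), rng)))

# [H1]-[H2] Each neighbor j encrypts m_ji and sends C_ji to node i.
ciphertexts = {j: encrypt(m, rng) for j, m in messages.items()}

# [H3] Node i computes prod_j C_ji^{a_ji} = E(sum_j a_ji * m_ji).
C_i = 1
for j, c in ciphertexts.items():
    C_i = C_i * pow(c, weights[j], N2) % N2

# [H4]-[H5] Each neighbor returns W_ji = C_i^{s_ji}; a negative exponent makes
# pow() use the modular inverse of C_i (Python 3.8+).
W = {j: pow(C_i, s, N2) for j, s in shares.items()}

# [H6] Node i multiplies the parts, obtaining C_i^{d_exp} = 1 + M*N (mod N^2),
# and applies L(x) = (x - 1) / N.
product = 1
for w in W.values():
    product = product * w % N2
result = (product - 1) // N % N
print(result)  # 1*5 + 2*11 + 3*2 + 1*7 = 40
```

No single neighbor's $W_{ji}$ reveals the plaintext; only the product of all $|N_i|$ parts does, which mirrors the $d_i = |N_i|$ case described above.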

If $d_i < |N_i|$ then the reconstruction is done using Lagrange interpolation in the exponent, where node $i$ needs to raise each $W_{ji}$ value to the corresponding Lagrange coefficient, and then multiply the results.

Regarding message overhead, first we need to gen- 
erate and disseminate public and private keys. This op- 
eration requires 2e messages, where e = \E\ is the 
number of graph edges. In each iteration we send the 
same number of messages as in the original numerical 
algorithm. However, assuming a security of i bits, and 
a working precision of d bits, we increase the size of 
the message by a factor of 4. Finally, we add $e$ messages for obtaining the private key parts in step [H4].

Regarding computation overhead, for each message sent, we need to perform one Paillier encryption in step [H1]. In step [H3] the destination node performs additional $k-1$ multiplications, and one decryption in step [H4]. At the key generation phase, we add the generation of $n$ random polynomials and their evaluation. In step [H4] we compute an extrapolation of those $n$ polynomials. The security of the Paillier encryption is investigated in [27, 20], where it was shown that the system provides semantic security.

4.3 A Construction Based on Shamir Secret Sharing 

We propose a construction based on Shamir's secret 
sharing, which avoids the computation cost of asym- 
metric encryption. In a nutshell, we use the neighbor- 
hood of a node for adding a privacy preserving mech- 
anism, where only a coalition of di or more nodes can 
reveal the content of messages sent to that node. 
In each round of the algorithm, when a node $j$ would like to send a scalar value $m_{ji}$ to node $i$ it does the following:

[S1] Generate a random polynomial $P_{ji}$ of degree $d_i - 1$, of the form $P_{ji}(x) = m_{ji} + \sum_{k=1}^{d_i - 1} a_k x^k$.

[S2] For each neighbor $l$ of node $i$, create a share $c_{jil}$ of the polynomial $P_{ji}(x)$ by evaluating it on a single point $x_l$, namely $c_{jil} = P_{ji}(x_l)$.

[S3] Send $c_{jil}$ to neighbor node $l$ of node $i$.

[S4] Each neighbor $l$ of node $i$ aggregates the shares it received from all neighbors of node $i$ and computes the value $S_{il} = \sum_{j \in N_i} a_{ji} c_{jil}$. (Note that the result of this computation is equal to the value of a polynomial of degree $d_i - 1$, whose free coefficient is equal to the weighted sum of all messages sent to node $i$ by its neighbors.)

[S5] Each neighbor $l$ sends the sum $S_{il}$ to node $i$.

[S6] Node $i$ treats the value received from node $l$ as a value of a polynomial of degree $d_i - 1$ evaluated at the point $x_l$. Node $i$ interpolates $P_i(x)$ to extract the free coefficient, which in this case is the weighted sum of all messages $\sum_{j \in N_i} a_{ji} m_{ji}$.

Note that the message $m_{ji}$ sent by node $j$ remains hidden if fewer than $d_i$ neighbors of $i$ collude to learn it (this is ensured since these neighbors learn strictly fewer than $d_i$ values of a polynomial of degree $d_i - 1$). The protocol requires each node $j$ to send messages to all other neighbors of each of its neighbors. We discuss the applicability of this requirement in Section 9.
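The following Python sketch simulates one round of steps [S1]-[S6] for a single node $i$, with all parties in one process. The prime modulus, the choice $x_l = l$ for the public points, and the helper names are illustrative assumptions.

```python
import random

P = 2**61 - 1  # prime modulus of the field Z_P (illustrative choice)


def eval_poly(coeffs, x):
    """Evaluate sum_k coeffs[k] * x^k mod P using Horner's rule."""
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % P
    return acc


def interpolate_at_zero(points):
    """Lagrange-interpolate {x: y} and return the free coefficient mod P."""
    total = 0
    for xj, yj in points.items():
        num = den = 1
        for xk in points:
            if xk != xj:
                num = num * xk % P
                den = den * (xk - xj) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total


def secure_weighted_sum(messages, weights, points, d_i, rng):
    """Simulate steps [S1]-[S6] for a single node i.

    messages[j] = m_ji and weights[j] = a_ji for each neighbor j of i;
    points[l] = x_l is the public field point of neighbor l of i.
    """
    # [S1]-[S3] Node j hides m_ji as the free coefficient of a random
    # polynomial of degree d_i - 1 and sends c_jil = P_ji(x_l) to each l.
    c = {}
    for j, m in messages.items():
        coeffs = [m % P] + [rng.randrange(P) for _ in range(d_i - 1)]
        for l, x_l in points.items():
            c[j, l] = eval_poly(coeffs, x_l)
    # [S4]-[S5] Neighbor l sends S_il = sum_j a_ji * c_jil to node i.
    S = {l: sum(weights[j] * c[j, l] for j in messages) % P for l in points}
    # [S6] Node i interpolates from any d_i of the received sums.
    chosen = list(points)[:d_i]
    return interpolate_at_zero({points[l]: S[l] for l in chosen})


rng = random.Random(11)
messages = {1: 4, 2: 9, 3: 1, 4: 6, 5: 2}  # m_ji
weights = {1: 2, 2: 1, 3: 3, 4: 1, 5: 5}   # a_ji
points = {l: l for l in messages}           # x_l = l (assumption)
print(secure_weighted_sum(messages, weights, points, d_i=3, rng=rng))  # 36
```

Each neighbor $l$ only ever sees field elements $c_{jil}$ and $S_{il}$; with $d_i = 3$ any two colluding neighbors see two points of each degree-2 polynomial, which leave its free coefficient undetermined.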

4.4 Extending the method to support multiplication 

Assume that node $i$ needs to compute the product of the values of two messages that it receives from nodes $j$ and $j'$. The Shamir secret sharing scheme can be extended to support multiplication using the construction of Ben-Or, Goldwasser and Wigderson, whose details appear in [9]. This requires two changes to the basic protocol. First, the degree of the polynomials must be strictly less than $|N_i|/2$, where $|N_i|$ is the number of neighbors of the node receiving the messages. (This means, in particular, that security is now only guaranteed as long as fewer than half of the neighbors collude.) In addition, the neighboring nodes must exchange a single round of messages after receiving the messages from nodes $j$ and $j'$. We have not implemented this variant of the protocol.
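The degree condition can be checked with a small sketch: if each neighbor multiplies its two shares (of polynomials of degree $t$) locally, the products lie on a polynomial of degree $2t$, which node $i$ can interpolate only if it has at least $2t + 1$ shares, i.e. only if $t < |N_i|/2$. This sketch (field, points and names are our own assumptions) illustrates only that local step, not the BGW degree-reduction round.

```python
import random

P = 2**61 - 1  # prime field Z_P (illustrative choice)


def share(secret, t, xs, rng):
    """Shares {x: f(x)} of a random degree-t polynomial f with f(0) = secret."""
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(t)]
    return {x: sum(a * pow(x, k, P) for k, a in enumerate(coeffs)) % P for x in xs}


def interpolate_at_zero(points):
    """Free coefficient of the polynomial through {x: y}, via Lagrange at 0."""
    total = 0
    for xj, yj in points.items():
        num = den = 1
        for xk in points:
            if xk != xj:
                num = num * xk % P
                den = den * (xk - xj) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total


rng = random.Random(5)
xs = [1, 2, 3, 4, 5]              # five neighbors of node i, so t must be < 5/2
t = 2                             # degree of each sharing polynomial
a_shares = share(6, t, xs, rng)   # value sent by node j
b_shares = share(7, t, xs, rng)   # value sent by node j'

# Local products lie on a polynomial of degree 2t = 4: all 5 points are needed.
prod_shares = {x: a_shares[x] * b_shares[x] % P for x in xs}
print(interpolate_at_zero(prod_shares))  # 42
```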

4.5 Working in different fields 

Fig. 1 Schematic message flow in the proposed methods (panels (a)-(d)). The task of node $i$ is to compute the sum of all messages sent to it by its neighbors. (a) describes a message sent from $j$ to $i$ using random perturbation. (b) describes step [S3] in our SSS scheme, where the same message $m_{ji}$ is split into shares sent to all of $i$'s neighbors. (c) describes step [S4] in our SSS scheme, where shares destined to $i$ are aggregated by its neighbors. (d) shows step [S5] in our SSS scheme, which is equivalent (in terms of message flow) to step [H2] in our homomorphic scheme.

The operations that can be applied to secrets in the Shamir secret sharing scheme, or to encrypted values in a homomorphic encryption scheme, are defined in a finite field or ring over which the schemes are defined (for example, in the secret sharing case, over a field $\mathbb{Z}_p$ where $p$ is a prime number). The operations that we want to compute, however, might be defined over the real numbers. Working in a field is sufficient for computing additions or multiplications of integers, if we know that the size of the field is larger than the maximum result of the operation. If the basic elements we work with are real numbers, we can first round them to the next integer, or, alternatively, first multiply them by some constant $c$ (say, $c = 10^6$) and then round the result to the closest integer. (This essentially means that we work with an accuracy of $1/c$ if the computation involves only additions, or an accuracy of $1/c^d$ if the computation involves summands composed of up to $d$ multiplications.)
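A minimal sketch of the scaling approach, assuming non-negative inputs (negative values would need an extra encoding, which the text does not discuss), the prime $2^{61} - 1$ as field size, and $c = 10^6$; the helpers `encode` and `decode` are hypothetical.

```python
P = 2**61 - 1  # field size; must exceed the largest encoded intermediate result
C = 10**6      # scaling constant c


def encode(x):
    """Map a non-negative real to Z_P: multiply by c, round to the closest integer."""
    return round(x * C) % P


def decode(v, scale=C):
    """Undo the scaling; use scale=C**d after d multiplications."""
    return v / scale


# Addition keeps the scale c, so the accuracy is 1/c.
s = (encode(1.25) + encode(2.5)) % P
print(decode(s))             # 3.75

# A product of two encoded values carries scale c^2, so decode with C**2.
prod = encode(1.25) * encode(2.5) % P
print(decode(prod, C**2))    # 3.125
```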

4.6 Discussion 

Handling division operations. Handling division is much harder, since we are essentially limited to working with integers. One possible workaround applies if we know in advance that a number $x$ might have to be divided by a number from a set $D$ (say, the numbers in the range $[1, 100]$). In that case we first multiply $x$ by the least common multiple (lcm) of the numbers in $D$. This initial step ensures that dividing the result by a number from $D$ results in an integer.
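The lcm workaround in a few lines of Python, using the example set $D = \{1, \ldots, 100\}$; the variable names are illustrative. Note that the lcm of $1, \ldots, 100$ is already a 41-digit number, so the field must be chosen large enough to hold the scaled values.

```python
import math

D = range(1, 101)            # possible divisors, as in the example above
lcm_D = math.lcm(*D)         # least common multiple of 1..100 (Python 3.9+)

x = 7
scaled = x * lcm_D           # stored instead of x

# Every division by a member of D is now exact integer division;
# the represented value is (scaled // d) / lcm_D = x / d.
quotients = {d: scaled // d for d in D}
print(all(quotients[d] * d == scaled for d in D))  # True
```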

An optimization. It is possible to optimize the SSS construction in the case that the threshold used in the vicinity of node $i$ equals the number of node $i$'s neighbors ($d_i = |N_i|$). In this case it is possible to avoid the polynomial evaluation and interpolation. This is done by replacing steps [S1, S2] with the following operation: when a node $j$ would like to send a scalar message $m_{ji}$ to node $i$, it selects $d_i$ numbers at random, such that their sum in an appropriate finite field is $m_{ji}$. Steps [S3]-[S5] remain the same. In step [S6], instead of performing a polynomial interpolation of the values it received, node $i$ simply aggregates them using summation to obtain the weighted sum $\sum_{j \in N_i} a_{ji} m_{ji}$. The drawback of this method is that there is no redundancy in the received parts, and as a result even a single neighbor that does not send its share to node $i$ can prevent node $i$ from completing its computation.
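A sketch of this additive-sharing variant of steps [S1]-[S6], assuming the prime field $\mathbb{Z}_P$ with $P = 2^{61} - 1$ and using hypothetical helper names; it simulates all parties in a single process.

```python
import random

P = 2**61 - 1  # the "appropriate finite field" Z_P (illustrative choice)


def additive_split(m, k, rng):
    """Replaces [S1, S2]: k random field elements that sum to m mod P."""
    parts = [rng.randrange(P) for _ in range(k - 1)]
    parts.append((m - sum(parts)) % P)
    return parts


rng = random.Random(9)
neighbors = [1, 2, 3, 4]             # N_i, so d_i = |N_i| = 4
messages = {1: 3, 2: 8, 3: 5, 4: 1}  # m_ji
weights = {1: 1, 2: 2, 3: 1, 4: 4}   # a_ji

# [S3] Node j sends its l-th part c_jil to neighbor l of node i.
parts = {j: dict(zip(neighbors, additive_split(m, len(neighbors), rng)))
         for j, m in messages.items()}
# [S4]-[S5] Neighbor l forwards S_il = sum_j a_ji * c_jil to node i.
S = {l: sum(weights[j] * parts[j][l] for j in messages) % P for l in neighbors}
# [S6] Node i just adds; a single missing S_il would make this impossible.
print(sum(S.values()) % P)  # 3 + 16 + 5 + 4 = 28
```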

Optimization of Lagrange interpolation. Each node must compute, in every step of the protocol, a Lagrange interpolation of the shares it receives. Namely, node $i$, which receives shares from nodes $u_1, \ldots, u_{d_i}$, must compute the free coefficient of the corresponding polynomial. This is done by computing the Lagrange interpolation formula $\sum_{j=1}^{d_i} \lambda_j P(u_j)$, where $P(u_j)$ is the share received from node $u_j$, and $\lambda_j$ is the corresponding Lagrange coefficient, which is defined as

$$\lambda_j = \frac{\prod_{1 \le k \le d_i,\, k \ne j} u_k}{\prod_{1 \le k \le d_i,\, k \ne j} (u_k - u_j)}.$$

Note that the computation of $\lambda_j$ involves many multiplications, but it depends only on the identities of $i$'s neighbors, rather than on the values of their shares. Therefore, node $i$ can precompute the Lagrange coefficients, and later use them to compute the linear combination $\sum_j \lambda_j P(u_j)$ of the shares it receives. This step considerably reduces the online overhead of $i$'s operation.
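A sketch of the precomputation, with the identities $u_j$ used directly as field points (an assumption; the prime and names are also illustrative). The coefficients are computed once per neighborhood and then reused for every round's shares.

```python
P = 2**61 - 1  # prime field Z_P (illustrative choice)


def lagrange_coeffs_at_zero(us):
    """Offline: lambda_j = prod_{k != j} u_k / (u_k - u_j) mod P for each u_j."""
    coeffs = {}
    for uj in us:
        num = den = 1
        for uk in us:
            if uk != uj:
                num = num * uk % P
                den = den * (uk - uj) % P
        coeffs[uj] = num * pow(den, -1, P) % P
    return coeffs


def combine(coeffs, shares):
    """Online: one multiply-add per received share, sum_j lambda_j * P(u_j)."""
    return sum(coeffs[u] * shares[u] for u in coeffs) % P


us = [3, 7, 11]                       # identities of i's neighbors (assumed)
coeffs = lagrange_coeffs_at_zero(us)  # computed once

# Two rounds with different degree-2 polynomials reuse the same coefficients.
round1 = {u: (42 + 5 * u + 9 * u * u) % P for u in us}
round2 = {u: (100 + 3 * u + 2 * u * u) % P for u in us}
print(combine(coeffs, round1), combine(coeffs, round2))  # 42 100
```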

Using a single polynomial for implementing broadcast. Assume a setting in which node $j$ needs to broadcast the same value $m_j$ to all its neighbors. (This assumption does not hold in general; however, there are special cases where it does hold, for example the Jacobi algorithm described in Section 7.1.)

In the SSS method described above, node $j$ needs to construct a different polynomial $P_{ji}$ for each neighbor $i$, encode $m_j$ as the free coefficient of $P_{ji}$, and send shares of $P_{ji}$ to the neighbors of node $i$. For implementing broadcast, node $j$ can generate a single polynomial whose free coefficient is $m_j$, and send values of this polynomial to the neighbors of each of its neighbors. This is possible if there is an upper bound of $d^{(j)} - 1$ on the number of colluders among the second-degree neighbors of $j$ (i.e., among neighbors of $j$'s neighbors), and it also holds that no neighbor of $j$ has fewer than $d^{(j)}$ neighbors. In that case node $j$ sets the degree of the polynomial to $d^{(j)} - 1$, and sends a share of this polynomial to each neighbor of its neighbors. (Setting the degree to this value enables each neighbor to interpolate the messages sent to it, yet prevents the colluders from learning illegitimate information.)

An obvious advantage of this approach is that j 
needs to send a single share to each neighbor u of its 
neighbors, even if u happens to be a neighbor of two 
or more of j's neighbors. In the previous SSS based 
method, node j needed to send to u a different share 
for every node i which is a joint neighbor of j and u. 

Consider now the aggregation operation that is per- 
formed by u, in which u computes a linear combina- 
tion of all shares which are destined to i. These shares 
are values of different polynomials (generated by different neighbors of i), but the requirement above ensures that each of these polynomials, say polynomial $P_j$ generated by node j, is of degree at most $d^{(j)} - 1$. This value is smaller than $d_i$, the number of neighbors of i, since no neighbor of j has fewer than $d^{(j)}$ neighbors. Therefore the linear combination of these polynomials is a polynomial whose degree is smaller than $d_i$. Node i receives $d_i$ shares of this polynomial and can therefore interpolate it.
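The aggregation argument can be checked numerically. In this sketch the field modulus, sender names, degrees and weights are all hypothetical: three senders share their values with polynomials of different degrees, each at most $d_i - 1$; every share holder forwards one weighted sum of the shares it holds, and node i recovers $\sum_j a_j m_j$ from $d_i$ such values.

```python
import random

PRIME = 2**61 - 1  # assumed field modulus; illustrative only


def share_poly(secret, degree, xs):
    """Evaluate a random polynomial with free coefficient `secret` at each x in xs."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(degree)]
    return {x: sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME for x in xs}


def interpolate_at_zero(points):
    """Lagrange interpolation of the free coefficient mod PRIME."""
    total = 0
    for xj, yj in points:
        num, den = 1, 1
        for xm, _ in points:
            if xm != xj:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        total = (total + yj * num * pow(den, -1, PRIME)) % PRIME
    return total


# Hypothetical setting: node i has d_i = 4 neighbors (ids 1..4) holding the shares.
holders = [1, 2, 3, 4]
messages = {"a": 5, "b": 7, "c": 11}  # values m_j sent by three senders
degrees = {"a": 1, "b": 2, "c": 3}    # each at most d_i - 1 = 3
weights = {"a": 2, "b": 3, "c": 1}    # linear coefficients applied by the holders

shares = {j: share_poly(messages[j], degrees[j], holders) for j in messages}

# Each holder u sends node i a single value: the weighted sum of the shares it holds.
aggregated = [(u, sum(weights[j] * shares[j][u] for j in messages) % PRIME) for u in holders]

print(interpolate_at_zero(aggregated))  # 2*5 + 3*7 + 1*11 = 42
```

Because the weighted sum of polynomials of degree at most 3 is again of degree at most 3, four aggregated values suffice; negative weights would simply be represented modulo the field size.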

Collusion of distant nodes in the graph. The privacy 
of the data that node j sends to node i, encoded based 
on Shamir's secret sharing using a polynomial of degree $d - 1$, is preserved as long as an adversary does
not get hold of d shares. Therefore, if j uses a different 
polynomial for encoding the messages sent to each of 
its neighbors (i.e., it does not use the method discussed 
in the previous paragraph), then it only needs to care 
about collusions between members of the set of neigh- 
bors of each of its neighbors separately. (E.g., if j has 
8 neighbors, and it is known that for each neighbor i it 
holds that no more than 3 of i's neighbors collude, then 
j can encode its messages using a different polynomial 
of degree 3 for each of its neighbors. This encoding is 
secure even if j sends the same message $m_j$ to all its neighbors, and even if the total number of colluders is much larger than 3, since each choice of the free coefficients of the polynomials is plausible given the values known to the colluders.)

If j uses a single polynomial to encode the mes- 
sages it sends to all its neighbors, as is detailed in the 
previous paragraph, then it must make sure that the degree of the polynomial is at least as large as the potential number of colluders among all the neighbors of its neighbors. It does not need to be concerned with the integrity of other members of the network.

5 Cryptographic background for the malicious adversary model

In this section, we describe two cryptographic primitives that are used in our construction for the malicious adversary model. We give a brief review of these primitives here, while Section 6 explains how they are used in the context of secure multi-party computation.

5.1 Pedersen VSS 

Pedersen [29] presents a non-interactive verifiable secret sharing (VSS) scheme. In this scheme, each party can verify that it received a correct share, without communicating with any other party.

Pedersen VSS is based on a commitment scheme which was also designed by Pedersen. (A commitment scheme enables a committer to commit to a value without revealing it. Later the committer can reveal that value. Other parties are assured that the committer was not able to change the committed value after the commitment was generated.) The commitment scheme is based on the assumption that the discrete logarithm problem is hard in a certain group. It operates in the following way: two generators g and h of the group are chosen at random. (The discrete logarithm assumption therefore implies that computing $\log_g(h)$ is infeasible.) In order to commit to a value $s$, the committer randomly chooses a value $t$ and computes the commitment $E(s,t) = g^s h^t$. Then, in order to open the commitment, the committer reveals $s$ and $t$.
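A toy sketch of the commitment $E(s,t) = g^s h^t$. The group parameters (the safe prime $p = 2039 = 2q + 1$ with $q = 1019$, and $g = 4$, $h = 9$, both squares and hence of order $q$) are assumptions chosen only so the example runs; with numbers this small $\log_g(h)$ is trivial to compute, so the sketch offers no security.

```python
import secrets

# Toy group (INSECURE, illustration only): p = 2q + 1 is a safe prime and g, h are
# squares mod p, hence both generate the subgroup of prime order q. A real
# deployment needs a large group in which no one knows log_g(h).
P_MOD, Q = 2039, 1019
G, H = 4, 9


def commit(s, t):
    """Pedersen commitment E(s, t) = g^s * h^t mod p (exponents reduced mod q)."""
    return pow(G, s % Q, P_MOD) * pow(H, t % Q, P_MOD) % P_MOD


def verify_opening(E, s, t):
    """Opening phase: the committer reveals (s, t) and anyone can recompute E."""
    return E == commit(s, t)


s = 42
t = secrets.randbelow(Q)  # the committer's random blinding value
E = commit(s, t)
print(verify_opening(E, s, t), verify_opening(E, s + 1, t))  # True False
```

The commitment is also additively homomorphic, $E(s_1,t_1)\,E(s_2,t_2) = E(s_1+s_2,\,t_1+t_2)$, so commitments can be combined linearly without being opened; the verification steps of the malicious-model protocol rely on this property.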

It was proven that the committer cannot change her mind after generating the commitment, by showing that a committer who can change the committed value from $s$ to a different value $s'$ can compute $\log_g(h)$. Pedersen then showed how to use this primitive to share a secret $s$ between $n$ parties in a way that enables them to verify that their shares are consistent. We describe this protocol below, where the dealer D plays the role of the committer of the basic commitment scheme.

[VS1] D performs the basic commitment scheme and computes a commitment to the secret $s$: $E_0 = E(s, t)$.

[VS2] D performs the first step of Shamir's secret sharing scheme: D randomly chooses a polynomial $P(x)$ of degree at most $d - 1$ subject to the constraint $P(0) = s$, and computes $P(i)$ for $i = 1, \ldots, n$ (we denote $P(x) = \sum_{k=0}^{d-1} p_k x^k$; therefore $p_0 = s$).

[VS3] D randomly chooses a polynomial $R(x)$ of degree at most $d - 1$ subject to the constraint $R(0) = t$, and computes $R(i)$ for $i = 1, \ldots, n$ (we denote $R(x) = \sum_{k=0}^{d-1} r_k x^k$; therefore $r_0 = t$).

[VS4] D performs the second step of Shamir's secret sharing scheme: D secretly sends $(P(i), R(i))$, the $i$'th share, to party $i$, for $i = 1, \ldots, n$.

[VS5] D computes and broadcasts a commitment to the coefficients $p_0, \ldots, p_{d-1}$ of $P(x)$, using the coefficients $r_0, \ldots, r_{d-1}$ of $R(x)$. I.e., D broadcasts $E_j = E(p_j, r_j)$ for $j = 0, \ldots, d-1$. Denote by $E = (E_0, E_1, \ldots, E_{d-1})$ the set of all commitments.

[VS6] Party $i$ can now verify that the share $(P(i), R(i))$ that it received is correct. This is done by verifying the equation $E(P(i), R(i)) = \prod_{j=0}^{d-1} E_j^{\,i^j}$.

Note that when the parties send their shares to the party who is supposed to combine them in order to compute the secret, the verification data $E_0, \ldots, E_{d-1}$ can be used to verify the correctness of the received shares.
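Steps [VS1]-[VS6] can be sketched as follows, reusing the same insecure toy group as an assumption; `deal` and `verify_share` are hypothetical helper names. Polynomial arithmetic is done modulo the group order $q$, which the text leaves implicit. The check in `verify_share` works because $\prod_k E_k^{i^k} = g^{\sum_k p_k i^k} h^{\sum_k r_k i^k} = g^{P(i)} h^{R(i)}$.

```python
import secrets

P_MOD, Q = 2039, 1019  # toy safe-prime group (insecure); g, h have prime order q
G, H = 4, 9


def commit(s, t):
    """Pedersen commitment E(s, t) = g^s h^t mod p."""
    return pow(G, s % Q, P_MOD) * pow(H, t % Q, P_MOD) % P_MOD


def evaluate(coeffs, x):
    """Evaluate sum_k coeffs[k] * x^k over Z_q."""
    return sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q


def deal(s, d, n):
    """Dealer side of [VS1]-[VS5] for secret s, degree d - 1, parties 1..n."""
    t = secrets.randbelow(Q)                                    # [VS1]: E_0 = E(s, t)
    p = [s % Q] + [secrets.randbelow(Q) for _ in range(d - 1)]  # [VS2]: P(0) = s
    r = [t] + [secrets.randbelow(Q) for _ in range(d - 1)]      # [VS3]: R(0) = t
    shares = {i: (evaluate(p, i), evaluate(r, i)) for i in range(1, n + 1)}  # [VS4]
    E = [commit(pk, rk) for pk, rk in zip(p, r)]                # [VS5]
    return shares, E


def verify_share(i, share, E):
    """[VS6]: party i checks E(P(i), R(i)) == prod_k E_k^(i^k)."""
    rhs = 1
    for k, Ek in enumerate(E):
        rhs = rhs * pow(Ek, pow(i, k, Q), P_MOD) % P_MOD
    return commit(*share) == rhs


shares, E = deal(s=42, d=3, n=5)
print(all(verify_share(i, sh, E) for i, sh in shares.items()))  # True
bad = (shares[2][0] + 1, shares[2][1])                           # a corrupted share
print(verify_share(2, bad, E))                                   # False
```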



5.2 Byzantine agreement 

The Byzantine agreement (Byzantine Generals) problem was first introduced by Pease, Shostak and Lamport [23]. It is now considered a fundamental problem in fault-tolerant distributed computing. The task is to reach agreement in a set of n nodes in which up to f nodes may be faulty. A distinguished node (the General or the initiator) broadcasts a value m, following which all nodes exchange messages until the non-faulty nodes agree upon the same value. If the initiator is non-faulty, then all non-faulty nodes are required to agree on the value that the initiator sent.

On-going faults whose nature is not predictable or 
that express complex behavior are most suitably ad- 
dressed in the Byzantine fault model. It is the preferred 
fault model in order to seal off unexpected behavior 
within limitations on the number of concurrent faults. 
With respect to the bounds on redundancy, the Byzantine agreement problem has been shown to have no deterministic solution if $n/3$ or more of the nodes are concurrently faulty [24].
A Byzantine Agreement protocol satisfies the following typical properties:

Agreement: The protocol returns the same value at all 
correct nodes; 

Validity: If the General is correct, then all the correct 
nodes return the value sent by the General; 
Termination: The protocol terminates in a finite time. 
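The protocol of [32] is not reproduced here. As a hedged illustration of the agreement and validity properties, the sketch below implements the classic recursive Oral Messages algorithm OM(m) of Lamport, Shostak and Pease, a different protocol whose message count grows exponentially in m. The traitor behavior (sending alternating bits) and the binary values are hypothetical simplifications.

```python
from collections import Counter


def majority(values, default=0):
    """Return the strict-majority value, or `default` if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else default


def om(m, commander, lieutenants, value, traitors, default=0):
    """Oral Messages algorithm OM(m); returns {lieutenant: value it decides on}."""
    # Step 1: the commander sends its value to every lieutenant. A traitorous
    # commander (hypothetical behavior) sends alternating bits instead.
    received = {
        lt: (value + idx) % 2 if commander in traitors else value
        for idx, lt in enumerate(lieutenants)
    }
    if m == 0:
        return received
    # Step 2: each lieutenant relays what it received to the others via OM(m - 1).
    relayed = {
        lt: om(m - 1, lt, [o for o in lieutenants if o != lt], received[lt], traitors, default)
        for lt in lieutenants
    }
    # Step 3: each lieutenant takes the majority of its direct and relayed values.
    return {
        lt: majority([received[lt]] + [relayed[o][lt] for o in lieutenants if o != lt], default)
        for lt in lieutenants
    }


# n = 4 nodes (commander 0, lieutenants 1..3) and f = 1 traitor, so n > 3f holds.
print(om(1, 0, [1, 2, 3], 1, traitors={3}))  # loyal lieutenants 1 and 2 both decide 1
print(om(1, 0, [1, 2, 3], 1, traitors={0}))  # all lieutenants decide the same value
```

The first call illustrates Validity (a correct General's value is adopted by the correct nodes) and the second illustrates Agreement despite a faulty General.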

Standard deterministic Byzantine agreement algorithms operate in the synchronous network model, in which it is assumed that all correct nodes initialize the agreement procedure and can exchange messages within a round. In the context of our paper one can use the modular solution appearing in [32]. That protocol requires $2f + 1$ rounds of communication and $O(nf^2)$ messages, where f is the bound on the number of malicious nodes. The protocol can be invoked by each node that wants to broadcast a message. The set of participating nodes is the General's direct neighbors. In some cases we may define a larger set, which contains some neighbors of neighbors. It is assumed that every pair of nodes in the set can exchange messages. For that to hold, we assume that the network connectivity among the members of the set is at least $2f + 1$ (see [16]).

Byzantine agreement is guaranteed to provide the same value at all participating non-faulty nodes. If the General is faulty, that value may turn out to be some default value. In the context of our paper, since we carry out computations based on the values sent by all the neighbors of a node, we assume that the default value is zero. We also assume that for each node i, $d_i > f$.



6 Extending the construction to support malicious adversaries

In this section we extend the SSS protocol of Section 4.3 to defend against malicious peers. The new protocol utilizes the mechanisms described in the previous section. Before presenting the full protocol, we devise a modified VSS scheme. This scheme is needed since the original VSS relies on a broadcast primitive, which does not exist in a Peer-to-Peer network. The modified protocol follows Pedersen's VSS scheme, except that step [VS5] is replaced by a Byzantine agreement. (Byzantine agreement is used in order to ensure that all relevant nodes receive the same verification information of Pedersen's protocol, i.e., the commitments to the polynomial coefficients.)



6.1 Modified VSS 

In each round of the algorithm, when a node j would like to send a scalar value $m_{ji}$ to node i, it performs the following operations:

[MV1] Node j performs steps [VS1]-[VS4] of Pedersen's scheme for creating verifiable shares of the message $m_{ji}$, and sends the shares to node i's neighbors.

[MV2] Node j runs a Byzantine agreement protocol between j and all of node i's neighbors (including i itself), in which node j broadcasts the set E of commitments to the coefficients of the polynomials. (We denote this set as $E^{(j)}$ later in the protocol.) This step replaces the broadcast primitive of [VS5], which does not exist in a Peer-to-Peer network.

[MV3] Node i's neighbors verify the validity of the shares using [VS6]. If the share received by neighbor $l$ is not valid, this neighbor informs node i about it and does not send it the required linear combination of shares.



6.2 The full protocol 
Initialization: 

[BS0] We assume that the coefficients $a_{ji}$ are known in advance to all neighbors of i. (If that is not the case, then these coefficients are decided by either node i or node j. That node must perform a Byzantine agreement with node i's neighbors on the value $a_{ji}$.)

This step is needed because node i's neighbors will verify the weighted sum computation that i carries out later.



In each round of the algorithm, when a node j would like to send a scalar value $m_{ji}$ to node i, it performs the following operations:

[BS1] Protocol [MV1]-[MV3] is executed for sending and verifying the shares of the message $m_{ji}$. (This step is only executed in the first round of the protocol. In later rounds it is replaced by Step [BS6], in which the neighbors also verify that $m_{ji}$ is indeed the message that j was supposed to send according to the protocol.)
[BS2] Each neighbor $l$ of node i that validated all the shares of all the neighbors of node i aggregates the shares it received from all these neighbors using the linear coefficients $\{a_{ji} \mid j \in N_i\}$, and computes the value $S_l = \sum_{j \in N_i} a_{ji} P_{ji}(l)$ and the value $T_l = \sum_{j \in N_i} a_{ji} R_{ji}(l)$. (This computation yields the values of two polynomials of degree $d_i - 1$, whose free coefficients are equal, respectively, to the weighted sum of all messages $m_{ji}$ sent to node i by its neighbors, and to the weighted sum of all values $t_{ji}$ that are used for computing the corresponding commitments.)

[BS3] Each neighbor $l$ sends the sums $S_l, T_l$ to node i.

[BS4] Node i first verifies the values it received from its neighbors. Each neighbor must send values of polynomials generated as linear combinations of the polynomials $P_{ji}, R_{ji}$ of all neighbors j of i for which commitments have been sent. Therefore a linear combination of the commitments can be used to verify the values received from the neighbors. In more detail, let $E^{(j)} = (E_0^{(j)}, \ldots, E_{d_i-1}^{(j)})$ be the commitments sent by node j with respect to the message $m_{ji}$. For every neighbor $l$ which sent to i the values $S_l, T_l$, node i verifies that $E(S_l, T_l) = \prod_{k=0}^{d_i-1} (E_k)^{l^k}$, where $E_k$ is equal to $\prod_{j \in N_i} (E_k^{(j)})^{a_{ji}}$.

[BS5] If $d_i$ or more messages sent from neighbors were verified correctly, node i considers these $S_l$ values as values of a polynomial $P_i$ of degree $d_i - 1$. It interpolates $P_i(x)$ in order to extract the free coefficient, which in this case is the weighted sum of the messages sent to it, $\sum_{j \in N_i} a_{ji} m_{ji}$.

[BS6] Note that the weighted sum $m'_i = \sum_{j \in N_i} a_{ji} m_{ji}$, which was calculated by i, is included in the message that i must send to its neighbors in the next round of the protocol. (Namely, i must send to every neighbor $i'$ of its own the message $m_{ii'} = a_{ii'} m'_i$.) In addition, we require that the value $t'$ that i uses in a VSS for computing $E'_0 = g^{m'_i} h^{t'}$ be $\sum_{j \in N_i} a_{ji} t_{ji}$, namely, equal to the linear combination of the $t$ values in the messages sent to i. The neighbors of i must therefore verify that they receive shares of $m'_i$. This is done in the following way. Each neighbor $i'$ first computes a linear combination of the free coefficients of the Pedersen commitments (of messages destined to i) that it received in the last step: $E_0^{(i)} = \prod_{j \in N_i} (E_0^{(j)})^{a_{ji}}$. Node i runs a modified VSS protocol in which it commits to $m'_i$ using the value $t'$ and polynomials $P'(\cdot)$ and $R'(\cdot)$ of the required degrees. The result is a vector $E'$ whose first entry is $E_0^{(i)}$ as defined above. Node i runs steps [MV1]-[MV3] with these values, sending them to all its neighbors. Each of the neighbors verifies that the first entry of $E'$ is indeed $E_0^{(i)}$. Later in the execution of this step, each node $i'$ verifies the linear combination it receives from its neighbors using the values $\{E^{(i)} \mid i \in N_{i'}\}$. If a verification by a node fails, it aborts the protocol and notifies its neighbors.
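Steps [BS1]-[BS6] can be traced end to end in a small example. The sketch uses the same insecure toy group; the neighbor set, messages, coefficients and helper names are hypothetical, and all polynomial arithmetic is assumed to be carried out modulo the group order $q$. It checks the [BS4] verification, the [BS5] interpolation, and the [BS6] check that $E_0^{(i)}$ is a commitment to $m'_i$ with blinding value $t'$.

```python
import secrets

P_MOD, Q = 2039, 1019  # toy safe-prime group (insecure); g, h have prime order q
G, H = 4, 9


def commit(s, t):
    return pow(G, s % Q, P_MOD) * pow(H, t % Q, P_MOD) % P_MOD


def evaluate(coeffs, x):
    return sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q


def interpolate_at_zero(points):
    """Lagrange interpolation of the free coefficient over Z_q."""
    total = 0
    for xj, yj in points:
        num, den = 1, 1
        for xm, _ in points:
            if xm != xj:
                num = num * (-xm) % Q
                den = den * (xj - xm) % Q
        total = (total + yj * num * pow(den, -1, Q)) % Q
    return total


# Hypothetical instance: node i has neighbors N_i = {1, 2, 3}, so d_i = 3.
N_i = [1, 2, 3]
d_i = len(N_i)
m = {1: 5, 2: 7, 3: 11}  # messages m_ji
a = {1: 2, 2: 3, 3: 1}   # coefficients a_ji, known to all neighbors ([BS0])

# [BS1]: each sender j deals P_ji, R_ji of degree d_i - 1 and commits to their coefficients.
poly, comms, t = {}, {}, {}
for j in N_i:
    t[j] = secrets.randbelow(Q)
    p = [m[j]] + [secrets.randbelow(Q) for _ in range(d_i - 1)]
    r = [t[j]] + [secrets.randbelow(Q) for _ in range(d_i - 1)]
    poly[j] = (p, r)
    comms[j] = [commit(pk, rk) for pk, rk in zip(p, r)]  # E^(j), agreed on in [MV2]

# [BS2]-[BS3]: neighbor l sends S_l = sum_j a_ji P_ji(l) and T_l = sum_j a_ji R_ji(l).
S = {l: sum(a[j] * evaluate(poly[j][0], l) for j in N_i) % Q for l in N_i}
T = {l: sum(a[j] * evaluate(poly[j][1], l) for j in N_i) % Q for l in N_i}

# [BS4]: combine commitments, E_k = prod_j (E_k^(j))^(a_ji), then check each (S_l, T_l).
E = [1] * d_i
for j in N_i:
    for k in range(d_i):
        E[k] = E[k] * pow(comms[j][k], a[j], P_MOD) % P_MOD


def check(l, S_l, T_l):
    rhs = 1
    for k in range(d_i):
        rhs = rhs * pow(E[k], pow(l, k, Q), P_MOD) % P_MOD
    return commit(S_l, T_l) == rhs


print(all(check(l, S[l], T[l]) for l in N_i))  # True
print(check(1, (S[1] + 1) % Q, T[1]))           # False: a corrupted sum is rejected

# [BS5]: interpolate the verified S_l values to obtain sum_j a_ji m_ji (mod q).
m_prime = interpolate_at_zero(list(S.items()))
print(m_prime)                                  # 2*5 + 3*7 + 1*11 = 42

# [BS6]: E_0^(i) = E[0] must be a commitment to m'_i with t' = sum_j a_ji t_ji.
t_prime = sum(a[j] * t[j] for j in N_i) % Q
print(commit(m_prime, t_prime) == E[0])        # True
```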



6.3 Protocol analysis 

The extended protocol has a higher complexity than the SSS protocol for the semi-honest model. Below we present an analysis of the efficiency of the extended protocol in terms of computational and message overhead.

First we list the computational and message over- 
head of the building blocks of the protocol, and then 
we sum them. 

- Byzantine Agreement: A Byzantine agreement among $|N_i|$ nodes, of which at most d nodes are malicious, consists of $|N_i| d^2$ messages in $2d + 3$ communication rounds.

- Polynomial Creation: Creating a random polynomial of degree $d - 1$ costs $O(d)$ random number generation operations; this computational overhead is negligible in the overall computational overhead.

- Polynomial Evaluation: Evaluating a polynomial of degree $d - 1$ costs $O(d^2)$ multiplication operations.

- Values Verification: Verifying a value (a share or a weighted sum) costs $O(d)$ exponentiation operations.

- Polynomial Interpolation: Interpolating a polynomial of degree $d - 1$ costs $O(d^2)$ multiplication operations.

Message overhead: The dominant element regarding message overhead is the Byzantine agreement protocol (Steps [BS0], [BS1], [BS6]), which requires $|N_i| d^2$ messages in $2d + 3$ communication rounds.

Computational overhead: The dominant element regarding computational overhead is the value verifications (Steps [BS1], [BS4], [BS6]), which all together cost $O(|N_i| d)$ exponentiation operations.

There are some means to reduce the number of computing operations:

- Polynomial evaluation optimization. Polynomial evaluation, which normally takes $O(d^2)$ multiplications, can be optimized to take $d$ multiplications if the value of the input parameter is bounded in a known range and the $d$ powers of all the possible input values are prepared ahead of time, e.g., in the initialization step (this implicitly requires knowing $d$ in advance). This decreases the number of computing operations in Steps [BS1] and [BS2]. A similar idea can be applied to the verifications in Steps [BS1], [BS4], and [BS6].

- Commitment calculation optimization. The commitments that are agreed on in Step [MV2], $E^{(j)} = (E_0^{(j)}, \ldots, E_{d-1}^{(j)})$, can be computed ahead of time if the polynomial coefficients are bounded in a known range. This can be done by preparing in advance the commitments $E_{x,y} = E(x,y) = g^x h^y$ for each $x, y$ in the bounded range. This decreases the number of computing operations in Step [MV2].
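Both optimizations amount to table lookups. A minimal sketch, assuming evaluation points and coefficients lie in small known ranges; the modulus, degree, ranges, toy group and helper names are hypothetical. `eval_with_table` uses one multiplication per coefficient once the powers are tabulated, whereas `eval_naive` recomputes every power and needs $O(d^2)$ multiplications.

```python
# Hypothetical parameters; the text does not fix them.
MOD = 1019
D = 4          # d: number of coefficients of a polynomial of degree d - 1
MAX_X = 16     # evaluation points (party indices) lie in 1..MAX_X


def precompute_powers(max_x, d, modulus):
    """table[x][k] = x^k mod modulus for x in 1..max_x and k in 0..d-1."""
    table = {}
    for x in range(1, max_x + 1):
        row = [1]
        for _ in range(d - 1):
            row.append(row[-1] * x % modulus)
        table[x] = row
    return table


def eval_naive(coeffs, x, modulus):
    """Recompute every power x^k from scratch: O(d^2) multiplications."""
    total = 0
    for k, c in enumerate(coeffs):
        power = 1
        for _ in range(k):
            power = power * x % modulus
        total = (total + c * power) % modulus
    return total


def eval_with_table(coeffs, x, table, modulus):
    """One multiplication per coefficient (d in total), using precomputed powers."""
    return sum(c * p for c, p in zip(coeffs, table[x])) % modulus


# Commitments over a small, bounded range can be tabulated once as well:
# COMMIT_TABLE[(x, y)] = g^x h^y mod p (toy, insecure group parameters).
P_MOD, G, H = 2039, 4, 9
COMMIT_TABLE = {(x, y): pow(G, x, P_MOD) * pow(H, y, P_MOD) % P_MOD
                for x in range(8) for y in range(8)}

table = precompute_powers(MAX_X, D, MOD)
coeffs = [42, 5, 17, 3]
print(all(eval_naive(coeffs, x, MOD) == eval_with_table(coeffs, x, table, MOD)
          for x in range(1, MAX_X + 1)))  # True
```

The commitment table grows with the product of the two ranges, so trading memory for exponentiations is only attractive when those ranges are small.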

Security. We argue here that either every party fol- 
lows the protocol, or the protocol aborts. Every modi- 
fied VSS protocol uses Byzantine agreement to broad- 
cast the verification data E. Therefore either all honest 
nodes receive shares corresponding to the same poly- 
nomial, or the protocol aborts. Furthermore, the linear 
combinations of these shares that are sent to node i can 
be verified by the same data, and therefore no neighbor 
of i can send a corrupt linear combination. Finally, in 
the next step of the protocol node i must send a linear 
combination of the messages it received. This is ver- 
ified by its neighbors, by using the same verification 
data, as is detailed in Step [BS6] of the protocol. 

For using the Byzantine agreement protocol, we demand that f, the number of malicious peers in each vicinity, be less than 1/3 of the peers in that vicinity. Alternatively, a Byzantine agreement with signatures can be used, tolerating any number of malicious nodes.



6.4 Discussion 

A more realistic model. The Shamir Secret Sharing protocol makes sure that one party's share is not revealed by other parties, unless at least t parties cooperate ($t - 1$ is the degree of the polynomial). However, a model in which most of the parties are insensitive to their privacy is realistic. In such a model, there is no reason to make an effort to prevent revealing shares of these insensitive nodes. This observation can be refined by setting a "paranoic coefficient" for each node that describes the extent of privacy-sensitivity of this node. As the "paranoic coefficient" decreases, so does the degree of the polynomial. In particular, insensitive nodes can use a polynomial of degree 0.

Synchronous vs. Asynchronous execution. In this paper, we have presented the iterative algorithm that computes weighted sums as a synchronous algorithm which operates in rounds. This was mainly done to simplify our exposition. However, in practice it is not valid to assume that the clocks and message delays are synchronized in a large Peer-to-Peer network. Luckily, it is known that linear iterative algorithms such as the Jacobi algorithm converge in asynchronous settings as well. Specifically, the Gauss-Seidel algorithm is an asynchronous version of the Jacobi algorithm which typically converges faster [10].
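The difference between the two update rules is small in code. The sketch below uses a hypothetical 3×3 system and runs both methods sequentially, so it does not model message delays or asynchrony. Jacobi reads only the previous sweep's values, while Gauss-Seidel reads values already updated in the current sweep; both stop when the largest change in a sweep drops below a tolerance.

```python
def iterate(A, b, in_place, tol=1e-10, max_iter=1000):
    """Run Jacobi (in_place=False) or Gauss-Seidel (in_place=True) until the
    largest change in one sweep drops below tol; return (x, sweeps)."""
    n = len(b)
    x = [0.0] * n
    for sweep in range(1, max_iter + 1):
        old = list(x)
        src = x if in_place else old  # Gauss-Seidel reads values updated in this sweep
        for i in range(n):
            s = sum(A[i][j] * src[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
        if max(abs(x[i] - old[i]) for i in range(n)) < tol:
            return x, sweep
    return x, max_iter


A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]  # exact solution is x = (1, 1, 1)
print(iterate(A, b, in_place=False)[1], iterate(A, b, in_place=True)[1])
```

On this system Gauss-Seidel needs fewer sweeps than Jacobi, in line with the statement above; this is an illustration, not a general guarantee.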

An optimization to the Byzantine agreement protocol. 
It is possible to optimize the Byzantine agreement by 
using a Public Key infrastructure that enables signa- 
tures. The existence of the Public Key infrastructure 
limits the ability of the malicious nodes to introduce 
undetected superfluous messages. Algorithms that reach 
Byzantine agreement under such assumptions require 
sending only a constant number of messages from each node to all participating nodes [17]. Moreover, such
protocols can overcome any ratio of faulty to correct 
nodes. 

Another optimization is to replace the deterministic protocol with a probabilistic one (cf. [8, 19, 13]). A typical probabilistic protocol terminates in an expected constant number of rounds. The drawback is that such protocols do not guarantee that all non-faulty nodes complete the protocol at the same time. In our context this implies that a node waits a round before sending messages based on the agreement, to ensure that the others have also completed the protocol. Our protocol requires running several agreements concurrently. The asynchronous nature of the execution allows nodes to use each value once the agreement about it is completed.



7 Case Study: neighborhood based collaborative filtering

To demonstrate the usefulness of our approach, we give a specific instance of a problem our framework can solve while preserving users' privacy. Our chosen example is in the field of collaborative filtering. We have chosen to implement the neighborhood based collaborative filtering algorithm, a state-of-the-art algorithm and winner of the Netflix Progress Prize of 2007. When adapting this algorithm to a Peer-to-Peer network, there are two main challenges. First, the algorithm is centralized, while we would like to distribute it without losing accuracy of the computed result. Second, we would like to add a privacy preserving layer, which prevents the computing nodes from learning any information about the ratings of neighboring nodes or other nodes, except for the computed solution.

We first describe the centralized version, and later we extend it to be computed in a Peer-to-Peer network. Given a possibly sparse user ratings matrix $R \in \mathbb{R}^{m \times n}$, where m is the number of users and n is the number of items, each user would like to compute output ratings for all the items.
In the neighborhood based approach [6], the output rating is computed using a weighted average of the neighboring peers' ratings:

$\hat{r}_{ik} = \sum_{j \in N_i} w_{ij} r_{jk}$ (1)

Our goal is to find the weights matrix W, where $w_{ij}$ signifies the weight node i assigns to node j.

We define the following least squares minimization problem for user i:

$\min_{w} \lVert Rw - b \rVert_2^2$

The optimal solution is obtained by differentiation, i.e., by solving the linear system of equations $Rw = b$ in the least squares sense. The optimal weights (for each user) are given by:

$w = (R^T R)^{-1} R^T b$ (2)
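For a small dense example, Eq. (2) can be evaluated by forming the normal equations $R^T R w = R^T b$ and solving them directly. This plain-Python sketch uses hypothetical data and Gaussian elimination chosen for brevity; it is a centralized reference computation, not the distributed protocol.

```python
def normal_equations(R, b):
    """Return (R^T R, R^T b) for a dense matrix R given as a list of rows."""
    n = len(R[0])
    RtR = [[sum(row[p] * row[q] for row in R) for q in range(n)] for p in range(n)]
    Rtb = [sum(row[p] * bi for row, bi in zip(R, b)) for p in range(n)]
    return RtR, Rtb


def solve(M, y):
    """Gaussian elimination with partial pivoting (M square and nonsingular)."""
    n = len(y)
    A = [row[:] + [yi] for row, yi in zip(M, y)]  # augmented matrix [M | y]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


R = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 3 x 2 data
b = [1.0, 2.0, 3.0]                        # consistent right-hand side
print(solve(*normal_equations(R, b)))      # [1.0, 2.0]
```

With an inconsistent right-hand side, such as `b = [1.0, 2.0, 4.0]`, the same call returns the least squares weights (here approximately 4/3 and 7/3) instead of an exact solution.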

We would like to distribute the neighborhood based collaborative filtering problem to be computed in a Peer-to-Peer network. Each peer has its own ratings as input (the matching row of the matrix R), and the goal is to compute locally, using interaction with neighboring nodes, the weight matrix W, where each node holds the matching row of this matrix. Furthermore, the peers would like to keep their input ratings private, so that no information is leaked during the computation to neighboring or other nodes. The peers will obtain only their matching output ratings as a result of this computation.

We propose a secure multi-party computation framework to solve the collaborative filtering problem efficiently and in a distributed manner, while preserving users' privacy. The computation does not reveal any information about users' prior ratings, nor about the computed results.

7.1 The Jacobi algorithm for solving systems of linear equations

In this section we give an example of one of the simplest iterative algorithms for solving systems of linear equations, the Jacobi algorithm. It will serve as an example of an algorithm our framework is able to compute, for solving the neighborhood based collaborative filtering problem. Note that there are numerous numerical methods we can compute securely using our framework, among them Gauss-Seidel, EM (expectation maximization), conjugate gradient, gradient descent, belief propagation, Cholesky decomposition, principal component analysis, SVD, etc.

Given a system of linear equations $Ax = b$, where $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$, the Jacobi algorithm [10] starts from an initial guess $x^0$, and iterates:

$x_i^r = \frac{b_i - \sum_{j \in N_i} A_{ij} x_j^{r-1}}{A_{ii}}$ (3)

The Jacobi algorithm is easily distributed, since initially each node i selects an initial guess $x_i^0$, and the values $x_j^r$ are sent among neighbors. A sufficient condition for the convergence of the algorithm is that the spectral radius satisfies $\rho(I - D^{-1}A) < 1$, where $I$ is the identity matrix and $D = \mathrm{diag}(A)$. This algorithm is known to work in asynchronous settings as well. In practice, when the Jacobi algorithm converges, its convergence time is logarithmic in n.
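A centralized sketch of Eq. (3) in plain Python; the 3×3 system is hypothetical. Row i only touches the entries $x_j$ with $A_{ij} \neq 0$, which is exactly the information node i receives from its neighbors $N_i$. The example matrix is strictly diagonally dominant, which guarantees $\rho(I - D^{-1}A) < 1$.

```python
def jacobi(A, b, iterations, x0=None):
    """Synchronous Jacobi iterations for Ax = b (A square with nonzero diagonal).

    Node i only needs the values x_j of its neighbors N_i = {j != i : A[i][j] != 0}.
    """
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    for _ in range(iterations):
        # The comprehension reads only the previous round's x, as in Eq. (3).
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i and A[i][j] != 0)) / A[i][i]
            for i in range(n)
        ]
    return x


A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]  # strictly diagonally dominant
b = [5.0, 6.0, 5.0]    # exact solution is x = (1, 1, 1)
print(jacobi(A, b, iterations=50))  # approximately [1.0, 1.0, 1.0]
```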

Our goal is to compute a privacy-preserving ver- 
sion of the Jacobi algorithm, where the inputs of the 
nodes are private, and no information is leaked during 
the rounds of the computation. 

Note that the Jacobi algorithm serves as an excellent example, since its simple update rule contains all the basic operations we would like to support: addition, multiplication and subtraction. Our framework supports all of these numerical operations, thus capturing numerous numerical algorithms.

7.2 Using the Jacobi algorithm for solving the neighborhood based collaborative filtering problem

First, we perform a distributed preconditioning of the matrix R. Each node i divides its input row of the matrix R by $R_{ii}$. This simple operation is done to avoid the division in Eq. (3), while not affecting the solution vector w.
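A sketch of the row scaling on a generic system with a nonzero diagonal, assuming that the corresponding right-hand side entry is divided by the same factor so that the solution is unchanged; the helper names and data are hypothetical. After scaling, $A_{ii} = 1$ and the update of Eq. (3) needs no division.

```python
def scale_rows(A, b):
    """Divide row i of A and b_i by A[i][i]; the solution x is unchanged."""
    A_scaled = [[a_ij / row[i] for a_ij in row] for i, row in enumerate(A)]
    b_scaled = [b_i / A[i][i] for i, b_i in enumerate(b)]
    return A_scaled, b_scaled


def jacobi_unit_diagonal(A, b, iterations):
    """Jacobi update without division, valid once every A[i][i] == 1."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x = [b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i) for i in range(n)]
    return x


A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
A1, b1 = scale_rows(A, b)
print(jacobi_unit_diagonal(A1, b1, iterations=50))  # approaches (1, 1, 1)
```

Each node can perform this scaling locally on its own row before the first round, so no extra communication is needed.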

Second, since the Jacobi algorithm's input is a square $n \times n$ matrix, and our rating matrix R is of size $m \times n$, we use the following "trick": we construct a new symmetric data matrix $\tilde{R}$ based on the non-square rating matrix $R \in \mathbb{R}^{m \times n}$:

$\tilde{R} = \begin{pmatrix} I & R^T \\ R & 0 \end{pmatrix} \in \mathbb{R}^{(m+n) \times (m+n)}$

Additionally, we define a new vector of variables $\tilde{w} = (w^T, z^T)^T \in \mathbb{R}^{(m+n) \times 1}$, where $w \in \mathbb{R}^{n \times 1}$ is the (to be shown) solution vector and $z \in \mathbb{R}^{m \times 1}$ is an auxiliary hidden vector, and a new observation vector $\tilde{b} = (0^T, b^T)^T \in \mathbb{R}^{(m+n) \times 1}$.

Now, we would like to show that solving the symmetric linear system $\tilde{R}\tilde{w} = \tilde{b}$ and taking the first n entries of the corresponding solution vector $\tilde{w}$ is equivalent to solving the original system $Rw = b$. Note that in the new construction the matrix $\tilde{R}$ is still sparse, and has at most $2mn$ off-diagonal nonzero elements. Thus, when running the Jacobi algorithm we have at most $2mn$ messages per round.

3 Computing the pseudo inverse solution (Eq. (2)) iteratively can be done more efficiently using newer algorithms, for example [11]. For clarity of explanation, we use the Jacobi algorithm.

Writing the symmetric linear system's equations explicitly, we get

$w + R^T z = 0, \quad Rw = b.$

By extracting w we obtain

$w = (R^T R)^{-1} R^T b,$

the desired solution of Eq. (2).

8 Experimental Results 

We have implemented our proposed constructions for the semi-honest model using a large scale simulation. Our simulation is written in C, consists of about 1,500 lines of code, and uses MPI for running the simulation in parallel. We run the simulation on a cluster of Linux Pentium IV computers (2.4 GHz, 4 GB RAM). We use the open source Paillier implementation of Q. Currently, only the semi-honest protocols are fully implemented. An area of future work is to implement the full protocol against malicious participants as well.

We use several large topologies for demonstrating the applicability of our approach. The different topologies are listed in Table 1. The DIMES dataset ED is an Internet router topology of around 300,000 routers and 2.2 million communication links connecting them, captured in January 2007. A subgraph of the DIMES dataset is shown in Figure 2. The Blog network is a social network: a web crawl of Internet blogs consisting of half a million blog sites and eleven million links connecting them. Finally, the Netflix [2] movie ratings data consists of around 500,000 users and 100,000,000 movie ratings. This last topology is a bipartite graph with users on one side and movies on the other. This topology is not a Peer-to-Peer network, but it is relevant for the collaborative filtering problem. We have artificially created a Peer-to-Peer network, where each user is a node, the movies are nodes as well, and the edges are the ratings assigned to the movies.



| Topology        | Nodes   | Edges     | Data Source |
|-----------------|---------|-----------|-------------|
| Blogs Web Crawl | 1.5M    | 8M        | IBM         |
| DIMES           | 337,326 | 2,249,832 | DIMES       |
| Netflix         | 497,759 | 100M      | Netflix     |

Table 1 Topologies used for experimentation



We ignore algorithm accuracy, since this problem was addressed in detail in [6]. We are mainly concerned with the overheads of the privacy preserving mechanisms. Based on the experimental results shown below, we conclude that the main overhead in implementing our proposed mechanisms is the computational overhead: the communication latency exists anyway in the underlying topology, and we compare runs of the algorithms with and without the added privacy mechanisms. For that purpose, we ignore the communication latency in our simulations. This can be justified because in the random perturbation and homomorphic encryption schemes we do not change the number of communication rounds, so the communication latency remains the same with or without the added privacy preserving mechanisms. In the SSS scheme, we double the number of communication rounds, so the incurred latency is doubled as well.

Table 2 compares the running times of the basic operations in the three schemes. Each operation was repeated 100,000 times and the average is reported. As expected, the heaviest computation is the Paillier asymmetric encryption, with a security parameter of 2,048 bits. While the SSS basic operations take on the order of ten microseconds, the Paillier basic operations take fractions of a second (except for the homomorphic multiplication, which is quite efficient since it does not involve exponentiation). In a Peer-to-Peer network, where a peer is likely to have tens of connections, sending an encrypted message to all of them will take several seconds. Furthermore, this time estimate assumes that the values sent by the function are scalars. In the vector case, the operation will be much slower.
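The "several seconds" estimate follows from the encryption time in Table 2. This back-of-the-envelope sketch assumes one Paillier encryption per neighbor, performed sequentially; the neighbor counts and the helper name are hypothetical.

```python
# Per-operation time of Paillier encryption from Table 2, in microseconds.
PAILLIER_ENCRYPT_US = 203478.62


def broadcast_encryption_seconds(num_neighbors, per_encryption_us=PAILLIER_ENCRYPT_US):
    """Time to encrypt one scalar message separately for each neighbor, assuming
    one encryption per neighbor and no parallelism (both are assumptions)."""
    return num_neighbors * per_encryption_us / 1e6


for k in (10, 20, 50):
    print(k, round(broadcast_encryption_seconds(k), 2))
```

With the measured 203,478.62 microseconds per encryption, 10 neighbors already cost about 2 seconds and 50 neighbors about 10 seconds, before any vector-valued messages are considered.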

Table 3 outlines the running time needed to run eight iterations of the Jacobi algorithm on the different topologies. Four modes of operation are listed: "None" means that we run the algorithm without adding any privacy layer, as a baseline for timing comparison. The other three rows show our three proposed schemes.

For the Netflix dataset, we had to use eight computing nodes in parallel, because the memory requirement of our simulation did not fit into a single computing node.

As clearly shown in Table 3, our SSS scheme has a significantly reduced computation overhead relative to the homomorphic encryption scheme, while having an equivalent level of security (assuming that the Paillier encryption is semantically secure). In a Peer-to-Peer network with tens of neighbors, the homomorphic encryption scheme incurs a high overhead on the computing nodes.
| Scheme              | Operation                              | Time (microseconds) | Msg size (bytes) |
|---------------------|----------------------------------------|---------------------|------------------|
| Random perturbation | Adding noise / receiver operation      | 0.0783745           | 8                |
| SSS                 | Polynomial generation and evaluation   | 11.18382125         | 8                |
|                     | Polynomial extrapolation               | 6.13709025          |                  |
| Paillier            | Key generation                         | 5016199.4           | 2048             |
|                     | Encryption                             | 203478.62           |                  |
|                     | Decryption                             | 193537.97           |                  |
|                     | Multiplication                         | 99.063958           |                  |

Table 2 Running time of local operations. As expected, the Paillier cryptosystem basic operations are time consuming relative to the SSS scheme.



| Topology | Scheme               | Time (HH:MM:SS) | Computing nodes |
|----------|----------------------|-----------------|-----------------|
| DIMES    | None                 | 0:33.36         |                 |
|          | Random Perturbations | 0:35.27         |                 |
|          | SSS                  | 10:53.44        |                 |
|          | Paillier             | 28:44:24.00     |                 |
| Blogs    | None                 | 1:28.16         |                 |
|          | Random Perturbations | 1:34.85         |                 |
|          | SSS                  | 38:00.24        |                 |
|          | Paillier             | 101:52:00.00    |                 |
| Netflix  | None                 | 5:31.14         | 8               |
|          | Random Perturbations | 5:54.69         | 8               |
|          | SSS                  | 21:40.00        | 8               |
|          | Paillier             |                 |                 |

Table 3 Running time of eight iterations of the Jacobi algorithm. The baseline timing corresponds to running without any privacy preserving mechanisms added. Empirical results show that the homomorphic scheme is about 1,350 times slower than the SSS scheme.




Fig. 2 DIMES Internet router topology consisting of around 300K routers and 2.2M communication links. A subgraph containing 500 nodes is shown.

9 Conclusion and Future Work 

As demonstrated by the experimental results, the secret sharing scheme in the semi-honest model has the lowest computation overhead relative to the other schemes. Furthermore, this scheme does not involve a trusted third party, as needed by the homomorphic encryption scheme for the threshold key generation phase. The size of the messages sent using this method is about the same as in the original method, unlike the homomorphic encryption, which significantly increases message sizes. However, the drawback of this scheme is that the neighbors of node i need to communicate directly among themselves (and each message sent to node i needs to be converted to messages sent to all its neighbors). In Peer-to-Peer systems with the locality property, it might be reasonable to assume that communication between the neighbors of node i is possible. (There is a way to circumvent this requirement by adding asymmetric encryption. Each node will have a public key, and messages destined to this node are encrypted using its public key. That way, if node j needs to send a message to node l, it can ask node i to deliver it, while ensuring that node i does not learn the content of the message. We identify this extension to our scheme as an area for future work.)

In the current work, we have extended the SSS 
protocol to defend against malicious participants. We 
have shown that the extension provides a completely 
secure solution. However, its main drawback is the high protocol overhead, since we need to perform multiple Byzantine agreement protocols and commitments for verifying every single computation done in the network. An area of future work is to bridge the gap between our theoretical work for the malicious case and a practical deployment in a Peer-to-Peer network.

Another area of future work is the extension of the homomorphic protocol to support malicious participants. One possible approach is to utilize a threshold Paillier cryptosystem that supports verification keys [20], which enable participants to verify the validity of encrypted messages.